    Informative Bayesian Modeling With Applications to Media Data

    This dissertation consists of three main parts. Each part develops an application or methodology within the Bayesian framework. The first is a study of multi-channel media consumption patterns for US audiences during the 2010 FIFA World Cup using a Bayesian data fusion strategy. We incorporate aggregated television ratings into the estimation, fusing data on a different scale with the individual-level data from alternative media platforms. The second study proposes an information integration method, called the information reweighted prior (IRP) approach, to incorporate external information via prior distributions through reweighting. We demonstrate the effectiveness of IRP with both simulated and real panel choice datasets, and show that 'sensible' external information, even when subject to considerable uncertainty, can improve inferences for quantities of interest. The third study proposes a rank enhanced likelihood (REL) approach to utilize ranking information via reconstruction of the likelihood. We demonstrate the effectiveness of REL with simulated datasets, and show that utilizing REL can also improve posterior inferences.
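The reweighting idea can be illustrated with a minimal sketch: combine the likelihood with an external prior raised to a weight w in [0, 1], so that uncertain external information is softly downweighted. The weighting scheme and Beta-form external prior below are illustrative assumptions, not the dissertation's exact IRP formulation.

```python
import numpy as np

# Hedged sketch of prior reweighting on a grid:
#   p(theta | y) ∝ L(y | theta) * base(theta) * external(theta)^w,
# where w in [0, 1] downweights uncertain external information.
# The Beta(8, 2) external belief is a hypothetical example.

def reweighted_posterior(successes, trials, w, grid_size=2001):
    theta = np.linspace(1e-6, 1 - 1e-6, grid_size)
    log_lik = successes * np.log(theta) + (trials - successes) * np.log1p(-theta)
    # Hypothetical external belief that theta is high: a Beta(8, 2) density
    log_ext = 7 * np.log(theta) + 1 * np.log1p(-theta)
    log_post = log_lik + w * log_ext            # flat base prior adds nothing
    post = np.exp(log_post - log_post.max())
    return theta, post / post.sum()             # normalized grid weights

theta, post = reweighted_posterior(successes=3, trials=10, w=0.5)
posterior_mean = float((theta * post).sum())
```

With w = 0 this reduces to the data-only posterior; with w = 1 the external information enters at full strength, and intermediate weights interpolate between the two.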

    Data-driven cyber attack detection and mitigation for decentralized wide-area protection and control in smart grids

    Modern power systems have already evolved into complicated cyber physical systems (CPS), often referred to as smart grids, due to the continuous expansion of the electrical infrastructure, the growing number of heterogeneous system components and players, and the consequential application of a diversity of information and telecommunication technologies to facilitate the Wide Area Monitoring, Protection and Control (WAMPAC) of day-to-day power system operation. Because of this reliance on cyber technologies, WAMPAC, among other critical functions, is prone to various malicious cyber attacks. Successful cyber attacks, especially those that sabotage the operation of the Bulk Electric System (BES), can cause great financial losses and social panic. Application of conventional IT security solutions is indispensable, but it often turns out to be insufficient to mitigate sophisticated attacks that exploit zero-day vulnerabilities or social engineering tactics. To further improve the resilience of smart grid operation under cyber attack, it is desirable to make the WAMPAC functions themselves capable of detecting various anomalies automatically, carrying out adaptive activity adjustments in time, and thus staying unimpaired even under attack. Most existing research efforts attempt to achieve this by adding novel functional modules, such as model-based anomaly detectors, to the legacy centralized WAMPAC functions. In contrast, this dissertation investigates the application of data-driven algorithms in cyber attack detection and mitigation within a decentralized architecture, aiming to improve the situational awareness and self-adaptiveness of WAMPAC. The first part of the research focuses on the decentralization of the System Integrity Protection Scheme (SIPS) with a Multi-Agent System (MAS), within which data-driven anomaly detection and optimal adaptive load shedding are further explored.
An algorithm named Support Vector Machine embedded Layered Decision Tree (SVMLDT) is proposed for the anomaly detection, providing satisfactory detection accuracy as well as decision-making interpretability. The adaptive load shedding is carried out by every agent individually with dynamic programming. The load shedding relies on load profile propagation among peer agents, and attack adaptiveness is accomplished by maintaining the historical mean of the load shedding proportion. Load shedding only takes place after consensus on the anomaly detection is achieved among all interconnected agents, and it serves the purpose of mitigating certain cyber attacks. The attack resilience of the decentralized SIPS is evaluated using the IEEE 39-bus model. It is shown that, unlike the traditional centralized SIPS, the proposed solution is able to carry out remedial actions under most Denial of Service (DoS) attacks. The second part investigates clustering-based anomalous behavior detection and peer-assisted mitigation for power system generation control. To reduce the dimensionality of the data, three metrics are designed to interpret the behavior conformity of each generator within the same balancing area. Semi-supervised K-means clustering and a density-sensitive clustering algorithm based on Hierarchical DBSCAN (HDBSCAN) are both applied in clustering in the 3D feature space. Aiming to mitigate cyber attacks targeting the generation control commands, a peer-assisted strategy is proposed. When a control command from the control center is detected as anomalous, i.e., either missing or with a manipulated payload, the generating unit utilizes peer data to infer and estimate a new generation adjustment value as a replacement.
Linear regression is utilized to obtain the relations among control values received by different generating units; Moving Target Defense (MTD) is adopted during peer selection; and 1-dimensional clustering is performed on the inferred control values, followed by the final control value estimation. The proposed mitigation strategy requires that generating units be able to communicate with each other in a peer-to-peer manner. Evaluation results suggest the efficacy of the proposed solution in counteracting data availability and data integrity attacks targeting the generation controls. However, the strategy remains effective only if fewer than half of the generating units are compromised, and it is not able to mitigate cyber attacks targeting the measurements involved in the generation control.
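The peer-assisted inference step can be sketched in a few lines: offline, learn a linear map from each peer's received command to the target unit's command; online, when the target's command is anomalous, project each peer's current command through its learned map and take a robust consensus. The coefficients and data below are synthetic, the median stands in for the 1-D clustering step, and the MTD-based peer selection is omitted; this is not the dissertation's actual scheme.

```python
import numpy as np

# Hedged sketch of peer-assisted control value replacement.
# Offline: learn a linear map peer_command -> target_command from history.
# Online: infer replacements from peers and take a robust consensus
# (median here stands in for the 1-D clustering step).

def fit_peer_models(history):
    """history: dict peer -> (peer_values, target_values) arrays."""
    models = {}
    for peer, (x, y) in history.items():
        slope, intercept = np.polyfit(x, y, 1)   # least-squares linear fit
        models[peer] = (slope, intercept)
    return models

def infer_replacement(models, peer_commands):
    estimates = [a * peer_commands[p] + b for p, (a, b) in models.items()]
    return float(np.median(estimates))           # robust to a minority of bad peers

rng = np.random.default_rng(0)
t = rng.uniform(10, 50, 100)                     # historical target commands (MW)
history = {p: (t * k + rng.normal(0, 0.1, 100), t)
           for p, k in [("G1", 0.8), ("G2", 1.2), ("G3", 0.5)]}
models = fit_peer_models(history)
# Current peer commands consistent with a true target command of ~30 MW
replacement = infer_replacement(models, {"G1": 24.0, "G2": 36.0, "G3": 15.0})
```

Taking the median of the peer-derived estimates is what limits the damage a minority of compromised peers can do, which mirrors the abstract's caveat that the strategy fails once half or more of the units are compromised.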

    Using TB-Sized Data to Understand Multi-Device Advertising

    In this study, we combine conversion funnel theory with machine learning methods to understand multi-device advertising. We investigate the important question of how the distribution of ads across multiple devices affects the consumer path to purchase. To handle the sheer volume of TB-sized impression data, we develop a MapReduce framework to estimate the non-stationary Hidden Markov Model in parallel. To accommodate the iterative nature of the estimation procedure, we leverage the Apache Spark framework and a corporate cloud computing service. We calibrate the model with hundreds of millions of impressions for 100 advertisers. Our preliminary results show that increasing the diversity of devices used for ad delivery consistently encourages consumers to become more engaged. In addition, advertiser heterogeneity plays an important role in the variation of the conversion process.
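The core of any HMM estimation procedure is the forward recursion, which the MapReduce step would evaluate per consumer sequence. A minimal single-sequence sketch is below; the engagement states, parameters, and observations are illustrative assumptions, and the study's actual model is non-stationary and estimated in parallel on Spark.

```python
import numpy as np

# Minimal scaled forward-algorithm sketch for an HMM of consumer
# engagement states (parameters and observations are illustrative).

def forward_loglik(pi, A, B, obs):
    """pi: initial state dist (S,); A: transitions (S, S);
    B: emissions (S, O); obs: sequence of observation indices."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()                  # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return loglik

pi = np.array([0.9, 0.1])                        # start mostly "browsing"
A = np.array([[0.8, 0.2],                        # browsing <-> engaged
              [0.3, 0.7]])
B = np.array([[0.9, 0.1],                        # P(no click / click | state)
              [0.4, 0.6]])
ll = forward_loglik(pi, A, B, [0, 0, 1, 1])
```

Because each sequence's log-likelihood is computed independently, the map step can score sequences in parallel and the reduce step can sum them, which is what makes the Spark-based estimation natural.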

    COVID-Net Assistant: A Deep Learning-Driven Virtual Assistant for COVID-19 Symptom Prediction and Recommendation

    As the COVID-19 pandemic continues to put a significant burden on healthcare systems worldwide, there has been growing interest in finding inexpensive symptom pre-screening and recommendation methods to assist in efficiently using available medical resources such as PCR tests. In this study, we introduce the design of COVID-Net Assistant, an efficient virtual assistant designed to provide symptom prediction and recommendations for COVID-19 by analyzing users' cough recordings through deep convolutional neural networks. We explore a variety of highly customized, lightweight convolutional neural network architectures generated via machine-driven design exploration (which we refer to as COVID-Net Assistant neural networks) on the Covid19-Cough benchmark dataset. The Covid19-Cough dataset comprises 682 cough recordings from a COVID-19 positive cohort and 642 from a COVID-19 negative cohort. Among the 682 cough recordings labeled positive, 382 recordings were verified by PCR test. Our experimental results are promising, with the COVID-Net Assistant neural networks demonstrating robust predictive performance, achieving AUC scores of over 0.93, with the best score over 0.95, while being fast and efficient in inference. The COVID-Net Assistant models are made available in an open source manner through the COVID-Net open initiative and, while they are not a production-ready solution, we hope their availability serves as a good resource for clinical scientists, machine learning researchers, and citizen scientists developing innovative solutions.
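The AUC scores reported above can be computed directly from model scores as a rank statistic: AUC is the probability that a randomly chosen positive outscores a randomly chosen negative (the Mann-Whitney U formulation). The labels and scores below are synthetic, purely to show the computation.

```python
import numpy as np

# AUC as the Mann-Whitney pairwise-win rate: the fraction of
# (positive, negative) pairs where the positive gets the higher score,
# counting ties as half a win. Labels and scores here are synthetic.

def auc_score(labels, scores):
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = auc_score(labels, scores)   # 8 of 9 positive-negative pairs won
```

An AUC over 0.93, as reported for the COVID-Net Assistant networks, means a positive cough recording outscores a negative one in over 93% of such pairs.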

    Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR

    Self-supervised pre-training can effectively improve the performance of low-resource automatic speech recognition (ASR). However, existing self-supervised pre-training approaches are task-agnostic, i.e., they can be applied to various downstream tasks. Although this enlarges their scope of application, the capacity of the pre-trained model is not fully utilized for the ASR task, and the learned representations may not be optimal for ASR. In this work, in order to build a better pre-trained model for low-resource ASR, we propose a pre-training approach called wav2vec-S, which uses task-specific semi-supervised pre-training to refine the self-supervised pre-trained model for the ASR task and thus more effectively utilizes the capacity of the pre-trained model to generate task-specific representations for ASR. Experiments show that, compared to wav2vec 2.0, wav2vec-S requires only a marginal increase in pre-training time but significantly improves ASR performance on in-domain, cross-domain and cross-lingual datasets. Average relative WER reductions are 24.5% and 6.6% for 1h and 10h fine-tuning, respectively. Furthermore, we show through canonical correlation analysis that semi-supervised pre-training can close the representation gap between the self-supervised pre-trained model and the corresponding fine-tuned model.
    Comment: Accepted by Interspeech 202
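"Relative WER reduction," the metric quoted above, is the improvement expressed as a fraction of the baseline word error rate rather than as an absolute difference. The WER figures below are made up for illustration; they are not the paper's per-dataset numbers.

```python
# Relative WER reduction as conventionally reported in ASR papers:
#   (baseline_wer - new_wer) / baseline_wer.
# The example WERs below are illustrative, not taken from the paper.

def relative_wer_reduction(baseline_wer, new_wer):
    return (baseline_wer - new_wer) / baseline_wer

# e.g. a baseline WER of 40.0% improved to 30.2% is a 24.5% relative reduction
r = relative_wer_reduction(40.0, 30.2)
```

Note that a 24.5% relative reduction from a 40% baseline is only a 9.8-point absolute improvement, which is why papers usually state which of the two conventions they use.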

    Impacts of DEM resolution and area threshold value uncertainty on the drainage network derived using SWAT

    Many hydrological algorithms have been developed to automatically extract drainage networks from DEMs, and the D8 algorithm is widely used worldwide to delineate drainage networks and catchments. The simulation accuracy of the SWAT model depends on characteristics of the watershed, and previous studies of DEM resolution and its impacts on drainage network extraction have not generally considered the effects of resolution and threshold value on uncertainty. In order to assess the influence of different DEM resolutions and drainage threshold values on drainage network extraction using the SWAT model, 10 basic watershed regions in China were chosen as case studies to analyse the relationship between extracted watershed parameters and the threshold value. SRTM DEM data at 3 different resolutions were used in this study, and regression analysis for DEM resolution, threshold value and extraction effects was performed. The results show that DEM resolution influences the selected flow accumulation threshold value; the suitable flow accumulation threshold value increases as the DEM resolution increases, and shows greater variability for basins with lower drainage densities. The link between drainage area threshold value and stream network extraction results was also examined: the sub-basin count varies with the threshold value following a power function y = ax^b, the maximum reach length increases as the threshold value increases, and the minimum reach length shows no relation with the threshold value. The stream network extracted from a 250 m DEM resolution and a 50 000 ha threshold value was similar to the real stream network. The drainage network density and the threshold value also follow a power function y = ax^b; the value of b is usually 0.5.
    Keywords: SWAT, digital elevation model (DEM), watershed delineation, threshold value
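The power relation y = ax^b reported above is typically fitted by taking logs, which turns it into the linear model log y = log a + b log x. The sketch below shows that transform with synthetic data; the coefficients are illustrative, not the study's fitted values.

```python
import numpy as np

# Fitting y = a * x^b by linear regression in log-log space:
#   log y = log a + b * log x.
# Threshold values and responses below are synthetic.

def fit_power_law(x, y):
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(log_a), b

threshold = np.array([1e3, 5e3, 1e4, 5e4, 1e5])  # area threshold values (ha)
y = 2.0 * threshold ** 0.5                        # synthetic data with a=2, b=0.5
a, b = fit_power_law(threshold, y)
```

Because the thresholds in such studies span orders of magnitude, the log-log fit weights each decade evenly, which is usually preferable to fitting the power curve directly in linear space.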